Multi-view based unlabeled data selection using feature transformation methods for semiboost learning
Authors
Abstract
SemiBoost [23] is a boosting framework for semi-supervised learning in which both unlabeled and labeled data contribute to learning. Various strategies have been proposed in the literature for selecting useful unlabeled data in SemiBoost. Recently, a multi-view based strategy was proposed in [20], in which the feature set of the data is decomposed into subsets (i.e., multiple views) using a feature-decomposition method. This decomposition inevitably loses some information. To avoid this drawback, this paper considers feature-transformation methods, rather than the decomposition method, to obtain the multiple views. More specifically, in the feature-transformation method, a number of views are obtained from the entire feature set using the same number of different mapping functions. Each view is then used to measure a corresponding confidence for evaluating the examples to be selected. All the confidence levels measured from the multiple views are then combined as a weighted average to derive a target confidence. The experimental results, obtained using support vector machines on well-known benchmark data, demonstrate that the proposed mechanism can compensate for the shortcomings of the traditional strategies. In addition, the results demonstrate that when the data is transformed appropriately into multiple views, the strategy can achieve further improvement in classification accuracy.

*Corresponding author: Sang-Woon Kim, Department of Computer Engineering, Myongji University, Yongin, 449-728 Korea; Email, [email protected]; Phone, +82-31-3306437; Facsimile, +82-31-335-9998. The work of the first and the second authors was supported by the Human Resources Program in Energy Technology of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), granted financial resource from the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20154030200770). This work was done as a follow-up study of [20]. Preprint submitted to Neurocomputing January 23, 2017.
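The selection procedure summarized in the abstract (several views obtained from the full feature set by different mapping functions, one confidence per view, then a weighted average) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the mapping functions are random linear projections (`make_views`), the per-view confidence is a simple RBF-similarity-weighted vote of labels in {-1, +1} rather than the actual SemiBoost confidence, and the view weights default to uniform. All function names are hypothetical.

```python
import numpy as np


def make_views(n_features, n_views, out_dim, rng):
    """Hypothetical mapping functions: one random linear projection per view.

    Each projection acts on the ENTIRE feature set (a transformation),
    not on a subset of features (a decomposition).
    """
    return [rng.standard_normal((n_features, out_dim)) / np.sqrt(out_dim)
            for _ in range(n_views)]


def view_confidence(Xl, yl, Xu, gamma=1.0):
    """Signed confidence in [-1, 1] for each unlabeled point in one view.

    Assumed stand-in for the SemiBoost confidence: an RBF-similarity-weighted
    vote of the labeled examples, with labels yl in {-1, +1}.
    """
    d2 = ((Xu[:, None, :] - Xl[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-gamma * d2)                      # similarity, shape (n_u, n_l)
    return (S @ yl) / (S.sum(axis=1) + 1e-12)    # normalized signed vote


def combined_confidence(Xl, yl, Xu, projections, weights=None):
    """Weighted average of the per-view confidences (the target confidence)."""
    k = len(projections)
    if weights is None:
        w = np.full(k, 1.0 / k)                  # assumed default: uniform
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
    per_view = np.stack([view_confidence(Xl @ P, yl, Xu @ P)
                         for P in projections])  # shape (k, n_u)
    return w @ per_view


def select_unlabeled(conf, n_select):
    """Pick the n_select most confident unlabeled points and their pseudo-labels."""
    idx = np.argsort(-np.abs(conf))[:n_select]
    return idx, np.sign(conf[idx])
```

In a SemiBoost-style loop, the selected points and their pseudo-labels would be added to the training set of the base classifier (support vector machines in the paper's experiments) at each boosting round. How the paper chooses the mapping functions and the view weights is not stated in the abstract, so both are left as parameters here.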
Similar resources
Unified subspace learning for incomplete and unlabeled multi-view data
Multi-view data with each view corresponding to a type of feature set are common in real world. Usually, previous multi-view learning methods assume complete views. However, multi-view data are often incomplete, namely some samples have incomplete feature sets. Besides, most data are unlabeled due to a large cost of manual annotation, which makes learning of such data a challenging problem. In ...
Supervised and Semisupervised Clustering Based on Feature Selection Algorithm Process
In clustering process, semi-supervised learning is a tutorial of contrivance learning methods that make usage of both labeled and unlabeled data for training characteristically a trifling quantity of labeled data with a great quantity of unlabeled data. Semi-supervised learning cascades in the middle of unsupervised learning (without any labeled training data) and supervised learning (with comp...
Feature Selection Algorithm for Supervised and Semisupervised Clustering
In clustering process, semi-supervised learning is a tutorial of contrivance learning methods that make usage of both labeled and unlabeled data for training characteristically a trifling quantity of labeled data with a great quantity of unlabeled data. Semi-supervised learning cascades in the middle of unsupervised learning (without any labeled training data) and supervised learning (with com...
On Combining Side Information and Unlabeled Data for Heterogeneous Multi-Task Metric Learning
Distance metric learning (DML) is critical for a wide variety of machine learning algorithms and pattern recognition applications. Transfer metric learning (TML) leverages the side information (e.g., similar/dissimilar constraints over pairs of samples) from related domains to help the target metric learning (with limited information). Current TML tools usually assume that different domains exp...
Learning with Low-Quality Data: Multi-View Semi-Supervised Learning with Missing Views
The focus of this thesis is on learning approaches for what we call “low-quality data” and in particular data in which only small amounts of labeled target data is available. The first part provides background discussion on low-quality data issues, followed by preliminary study in this area. The remainder of the thesis focuses on a particular scenario: multi-view semi-supervised learning. Multi...
Journal: Neurocomputing
Volume: 249
Publication date: 2017